Detection-based ASR in the automatic speech attribute transcription project
Abstract
We present methods of detector design in the Automatic Speech Attribute Transcription project. This paper details the results of a student-led, cross-site collaboration between Georgia Institute of Technology, The Ohio State University and Rutgers University. The work reported in this paper describes and evaluates the detection-based ASR paradigm and discusses phonetic attribute classes, methods of detecting framewise phonetic attributes and methods of combining attribute detectors for ASR. We use Multi-Layer Perceptrons, Hidden Markov Models and Support Vector Machines to compute confidence scores for several prescribed sets of phonetic attribute classes. We use Conditional Random Fields (CRFs) and knowledge-based rescoring of phone lattices to combine framewise detection scores for continuous phone recognition on the TIMIT database. By incorporating all of the attribute detectors discussed in the paper, the CRF system achieves a phone accuracy of 70.63%, outperforming both the baseline and the enhanced HMM systems.
Similar resources
An overview on automatic speech attribute transcription (ASAT)
Automatic Speech Attribute Transcription (ASAT), an ITR project sponsored under the NSF grant (IIS-04-27113), is a cross-institute effort involving Georgia Institute of Technology, The Ohio State University, University of California at Berkeley, and Rutgers University. This project approaches speech recognition from a more linguistic perspective: unlike traditional ASR systems, humans detect ac...
Optimizing Data Selection for Automatic Speech Recognition in Low Resource Languages
Developing Automatic Speech Recognition (ASR) systems for low resource languages is a labor-, computation-, and time-intensive task. Data selection techniques seek highly informative subsets of speech data for transcription and can lead to considerable reduction in time and expense for transcription and ASR training. This project investigates unsupervised and supervised data selection techniques...
Stacked auto-encoder for ASR error detection and word error rate prediction
Recently, Stacked Auto-Encoders (SAE) have been successfully used for learning imbalanced datasets. In this paper, for the first time, we propose to use a Neural Network classifier furnished by an SAE structure for detecting the errors made by a strong Automatic Speech Recognition (ASR) system. Error detection on an automatic transcription provided by a "strong" ASR system, i.e. exhibiting a sm...
An Information-Extraction Approach to Speech Analysis and Processing
It is believed that the current capabilities of the state-of-the-art automatic speech recognition (ASR) technologies can be further enhanced by analysis and processing techniques that can take advantage of the full set of acoustic and linguistic information existing at various levels of the speech knowledge hierarchy. This calls for a bottom-up knowledge integration framework that links speech ...
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection
We have recently presented an automatic speech recognition (ASR) system operating on Frisian-Dutch code-switched speech. This type of speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we extend this work by using some raw broadcast data to improve multilingually trained deep neural networks (DNN) that have been trained on 11.5 ...
Publication date: 2007